We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked out image-text aligned vision features conditioned on visible image patches. Via this pretext task, we can efficiently scale up EVA to one billion parameters, and sets new records on a broad range of representative vision downstream tasks, such as image recognition, video action recognition, object detection, instance segmentation and semantic segmentation without heavy supervised training. Moreover, we observe quantitative changes in scaling EVA result in qualitative changes in transfer learning performance that are not present in other models. For instance, EVA takes a great leap in the challenging large vocabulary instance segmentation task: our model achieves almost the same state-of-the-art performance on LVISv1.0 dataset with over a thousand categories and COCO dataset with only eighty categories. Beyond a pure vision encoder, EVA can also serve as a vision-centric, multi-modal pivot to connect images and text. We find initializing the vision tower of a giant CLIP from EVA can greatly stabilize the training and outperform the training from scratch counterpart with much fewer samples and less compute, providing a new direction for scaling up and accelerating the costly training of multi-modal foundation models. To facilitate future research, we release all the code and models at https://github.com/baaivision/EVA.
translated by 谷歌翻译
由于预计不断增长的3D视觉应用程序将为用户提供具有成本效益和高质量的体验,因此人们非常强调点云的视觉质量。回顾点云质量评估(PCQA)方法的开发,通常通过使用单模式信息,即从2D投影或3D点云中提取的视觉质量进行评估。 2D投影包含丰富的纹理和语义信息,但高度依赖于观点,而3D点云对几何变形更敏感,并且对观点不变。因此,为了利用点云和投影图像模式的优势,我们提出了一种新型的无引用点云质量评估(NR-PCQA),以多模式方式进行。在具体上,我们将点云分为子模型,以表示局部几何变形,例如点移和下采样。然后,我们将点云渲染为2D图像投影,以进行纹理特征提取。为了实现目标,子模型和投影图像由基于点和基于图像的神经网络编码。最后,使用对称的跨模式注意来融合多模式质量意识的信息。实验结果表明,我们的方法的表现都优于所有最新方法,并且远远超过了先前的NR-PCQA方法,这突出了所提出方法的有效性。
translated by 谷歌翻译
利用图像生成模型的最新进展,现有的可控面图像合成方法能够生成具有某些可控性的高保真图像,例如控制生成的面部图像的形状,表达,纹理和姿势。但是,这些方法集中在2D图像生成模型上,这些模型容易在大表达和姿势变化下产生不一致的面部图像。在本文中,我们提出了一个新的基于NERF的条件3D面部合成框架,该框架可以通过从3D脸先进的3D面部施加显式3D条件来对生成的面部图像进行3D可控性。其核心是有条件的生成占用场(CGOF),可有效地强制生成的面部形状,以使其对给定的3D形态模型(3DMM)网格进行。为了准确控制合成图像的细粒3D面部形状,我们还将3D地标损耗以及体积翘曲损失纳入我们的合成算法中。实验验证了所提出的方法的有效性,该方法能够生成高保真的面部图像,并显示出比基于2D的最新可控制的面部合成方法更精确的3D可控性。在https://keqiangsun.github.io/projects/cgof上查找代码和演示。
translated by 谷歌翻译
由于空间分辨率的巨大改进,4K内容可以为消费者提供更严肃的视觉体验。但是,由于分辨率扩大和特定的扭曲,现有的盲图质量评估(BIQA)方法不适合原始和升级的4K内容物。在本文中,我们提出了一个针对4K内容的深度学习的BIQA模型,一方面可以识别True和pseudo 4K内容,另一方面可以评估其感知视觉质量。考虑到高空间分辨率可以代表更丰富的高频信息的特征,我们首先提出了基于灰色级别的共发生矩阵(GLCM)的纹理复杂度度量,以从4K图像中选择三个代表性图像贴片,这可以减少计算复杂性,被证明对通过实验的总体质量预测非常有效。然后,我们从卷积神经网络(CNN)的中间层中提取不同种类的视觉特征,并将它们集成到质量感知的特征表示中。最后,使用两个多层感知(MLP)网络用于将质量感知功能映射到类概率和每个贴片的质量分数中。总体质量指数是通过平均贴片结果汇总获得的。提出的模型通过多任务学习方式进行了训练,我们引入了不确定性原理,以平衡分类和回归任务的损失。实验结果表明,所提出的模型的表现均优于所有4K内容质量评估数据库中的BIQA指标。
translated by 谷歌翻译
多代理行为建模旨在了解代理之间发生的交互。我们从行为神经科学,Caltech鼠标社交交互(CALMS21)数据集中提供了一个多代理数据集。我们的数据集由社交交互的轨迹数据组成,从标准居民入侵者测定中自由行为小鼠的视频记录。为了帮助加速行为研究,CALMS21数据集提供基准,以评估三种设置中自动行为分类方法的性能:(1)用于培训由单个注释器的所有注释,(2)用于风格转移以进行学习互动在特定有限培训数据的新行为学习的行为定义和(3)的注释差异。 DataSet由600万个未标记的追踪姿势的交互小鼠组成,以及超过100万帧,具有跟踪的姿势和相应的帧级行为注释。我们的数据集的挑战是能够使用标记和未标记的跟踪数据准确地对行为进行分类,以及能够概括新设置。
translated by 谷歌翻译
Hybrid unmanned aerial vehicles (UAVs) integrate the efficient forward flight of fixed-wing and vertical takeoff and landing (VTOL) capabilities of multicopter UAVs. This paper presents the modeling, control and simulation of a new type of hybrid micro-small UAVs, coined as lifting-wing quadcopters. The airframe orientation of the lifting wing needs to tilt a specific angle often within $ 45$ degrees, neither nearly $ 90$ nor approximately $ 0$ degrees. Compared with some convertiplane and tail-sitter UAVs, the lifting-wing quadcopter has a highly reliable structure, robust wind resistance, low cruise speed and reliable transition flight, making it potential to work fully-autonomous outdoor or some confined airspace indoor. In the modeling part, forces and moments generated by both lifting wing and rotors are considered. Based on the established model, a unified controller for the full flight phase is designed. The controller has the capability of uniformly treating the hovering and forward flight, and enables a continuous transition between two modes, depending on the velocity command. What is more, by taking rotor thrust and aerodynamic force under consideration simultaneously, a control allocation based on optimization is utilized to realize cooperative control for energy saving. Finally, comprehensive Hardware-In-the-Loop (HIL) simulations are performed to verify the advantages of the designed aircraft and the proposed controller.
translated by 谷歌翻译
Graph Neural Networks (GNNs), originally proposed for node classification, have also motivated many recent works on edge prediction (a.k.a., link prediction). However, existing methods lack elaborate design regarding the distinctions between two tasks that have been frequently overlooked: (i) edges only constitute the topology in the node classification task but can be used as both the topology and the supervisions (i.e., labels) in the edge prediction task; (ii) the node classification makes prediction over each individual node, while the edge prediction is determinated by each pair of nodes. To this end, we propose a novel edge prediction paradigm named Edge-aware Message PassIng neuRal nEtworks (EMPIRE). Concretely, we first introduce an edge splitting technique to specify use of each edge where each edge is solely used as either the topology or the supervision (named as topology edge or supervision edge). We then develop a new message passing mechanism that generates the messages to source nodes (through topology edges) being aware of target nodes (through supervision edges). In order to emphasize the differences between pairs connected by supervision edges and pairs unconnected, we further weight the messages to highlight the relative ones that can reflect the differences. In addition, we design a novel negative node-pair sampling trick that efficiently samples 'hard' negative instances in the supervision instances, and can significantly improve the performance. Experimental results verify that the proposed method can significantly outperform existing state-of-the-art models regarding the edge prediction task on multiple homogeneous and heterogeneous graph datasets.
translated by 谷歌翻译
In this paper, we propose an effective unified control law for accurately tracking agile trajectories for lifting-wing quadcopters with different installation angles, which have the capability of vertical takeoff and landing (VTOL) as well as high-speed cruise flight. First, we derive a differential flatness transform for the lifting-wing dynamics with a nonlinear model under coordinated turn condition. To increase the tracking performance on agile trajectories, the proposed controller incorporates the state and input variables calculated from differential flatness as feedforward. In particular, the jerk, the 3-order derivative of the trajectory, is converted into angular velocity as a feedforward item, which significantly improves the system bandwidth. At the same time, feedback and feedforward outputs are combined to deal with external disturbances and model mismatch. The control algorithm has been thoroughly evaluated in the outdoor flight tests, which show that it can achieve accurate trajectory tracking.
translated by 谷歌翻译
For guiding the UAV swarm to pass through narrow openings, a trapezoid virtual tube is designed in our previous work. In this paper, we generalize its application range to the condition that there exist obstacles inside the trapezoid virtual tube and UAVs have strict speed constraints. First, a distributed vector field controller is proposed for the trapezoid virtual tube with no obstacle inside. The relationship between the trapezoid virtual tube and the speed constraints is also presented. Then, a switching logic for the obstacle avoidance is put forward. The key point is to divide the trapezoid virtual tube containing obstacles into several sub trapezoid virtual tubes with no obstacle inside. Formal analyses and proofs are made to show that all UAVs are able to pass through the trapezoid virtual tube safely. Besides, the effectiveness of the proposed method is validated by numerical simulations and real experiments.
translated by 谷歌翻译
Reinforcement learning-based (RL-based) energy management strategy (EMS) is considered a promising solution for the energy management of electric vehicles with multiple power sources. It has been shown to outperform conventional methods in energy management problems regarding energy-saving and real-time performance. However, previous studies have not systematically examined the essential elements of RL-based EMS. This paper presents an empirical analysis of RL-based EMS in a Plug-in Hybrid Electric Vehicle (PHEV) and Fuel Cell Electric Vehicle (FCEV). The empirical analysis is developed in four aspects: algorithm, perception and decision granularity, hyperparameters, and reward function. The results show that the Off-policy algorithm effectively develops a more fuel-efficient solution within the complete driving cycle compared with other algorithms. Improving the perception and decision granularity does not produce a more desirable energy-saving solution but better balances battery power and fuel consumption. The equivalent energy optimization objective based on the instantaneous state of charge (SOC) variation is parameter sensitive and can help RL-EMSs to achieve more efficient energy-cost strategies.
translated by 谷歌翻译